Loading of Data. The data had been downloaded with an API from “Danmarkstatistik” into OpenRefine and cleaned. I will do the rest of the modifications here in RStudio.

I load the data for 2007 on the number of 18-year-olds who moved to Copenhagen, broken down by the municipality they moved from. Then I check the first 6 rows to confirm that the data looks correct and to see the column specifications.

data07 <- read_csv("aar2007.csv", show_col_types = FALSE)

head(data07)
## # A tibble: 6 × 4
##     TID FRAKOMMUNE    ALDER INDHOLD
##   <dbl> <chr>         <chr>   <dbl>
## 1  2007 Koebenhavn    18 år       0
## 2  2007 Frederiksberg 18 år      63
## 3  2007 Dragoer       18 år      10
## 4  2007 Taernby       18 år      41
## 5  2007 Albertslund   18 år      10
## 6  2007 Ballerup      18 år      21

Here I want to see how large a percentage of each municipality's 18-year-old population the people who moved constitute. I again use an API from Danmarkstatistik to find the total number of 18-year-olds in each municipality.

aar1808 <- read_csv("antal18.csv", show_col_types = FALSE)
head(aar1808)
## # A tibble: 6 × 2
##   OMRÅDE        INDHOLD
##   <chr>           <dbl>
## 1 Koebenhavn       4042
## 2 Frederiksberg     648
## 3 Dragoer           142
## 4 Taernby           471
## 5 Albertslund       435
## 6 Ballerup          541

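I merge the two datasets with the mutate function. As we can see below, the data07 set now has 5 variables. Note that mutate adds the new column by row position, so this relies on both files listing the municipalities in the same order, which the two previews above suggest.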

data07 %>% 
  mutate(Total18 = aar1808$INDHOLD) -> data07

head(data07)
## # A tibble: 6 × 5
##     TID FRAKOMMUNE    ALDER INDHOLD Total18
##   <dbl> <chr>         <chr>   <dbl>   <dbl>
## 1  2007 Koebenhavn    18 år       0    4042
## 2  2007 Frederiksberg 18 år      63     648
## 3  2007 Dragoer       18 år      10     142
## 4  2007 Taernby       18 år      41     471
## 5  2007 Albertslund   18 år      10     435
## 6  2007 Ballerup      18 år      21     541
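
Adding the column by position only works while the two tables stay in the same row order. As a hedged alternative, the sketch below matches the rows on the municipality name with `left_join`. The tables `moves` and `population` are small stand-ins built from the first rows shown above, not the full files, and the second one is deliberately shuffled.

```r
library(dplyr)

# Stand-in for data07: the first rows from the preview above.
moves <- tibble(
  FRAKOMMUNE = c("Koebenhavn", "Frederiksberg", "Dragoer"),
  INDHOLD    = c(0, 63, 10)
)

# Stand-in for aar1808, deliberately in a different row order.
population <- tibble(
  OMRÅDE  = c("Dragoer", "Koebenhavn", "Frederiksberg"),
  INDHOLD = c(142, 4042, 648)
)

# Rename the population count first so the two INDHOLD columns do not clash,
# then match each row on the municipality name instead of the row position.
joined <- moves %>%
  left_join(
    population %>% rename(Total18 = INDHOLD),
    by = c("FRAKOMMUNE" = "OMRÅDE")
  )

joined
```

Each municipality gets its own total even though the rows were shuffled, which a position-based mutate would not guarantee.
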
data07 %>% 
  mutate(procent = (INDHOLD/Total18)*100) -> data07

# I add the new column to the existing dataset because I want to keep working with it.

I use the “mutate” function because I want to work with data that is already in my dataset. I use mutate to make a new column, procent, that shows the percentage of each municipality's 18-year-olds who moved to Copenhagen. I expect to find small numbers everywhere.
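
To make the unit explicit, here is a small worked sketch of the calculation using three rows from the preview above. The vector names `moved`, `total18` and `promille` are only illustrative. Multiplying by 100 gives a percentage, whereas a promille would require multiplying by 1000.

```r
# Three rows taken from the head() output above.
moved   <- c(Frederiksberg = 63,  Dragoer = 10,  Taernby = 41)
total18 <- c(Frederiksberg = 648, Dragoer = 142, Taernby = 471)

procent  <- moved / total18 * 100    # share per hundred, as in the mutate call
promille <- moved / total18 * 1000   # share per thousand, shown for comparison

round(procent, 2)
```

For Frederiksberg this gives 63 / 648 * 100, which is roughly 9.72 percent.
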

Because I use two very specific tools, municipalityDK and mapDK, the municipality names in my data must match theirs 100%, otherwise they cannot read my data. I already cleaned up the special characters in OpenRefine, but now I also need to rename the column and convert the municipality names to lowercase.

#for 2007
data07 %>% 
  mutate(FRAKOMMUNE = tolower(FRAKOMMUNE)) -> data07

data07 %>% 
  rename(kommune = FRAKOMMUNE) -> data07

Here I use mapDK to create a map that shows the percentage who moved. Low values are shown in dark blue, and higher values in lighter blue.

kommunekort1 <- mapDK(values = 'procent', id = 'kommune', data = data07)
## Warning in mapDK(values = "procent", id = "kommune", data = data07): Some id not
## recognized: taernby
## Warning in mapDK(values = "procent", id = "kommune", data = data07): You
## provided no data for the following ids: taarnby
kommunekort1

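mapDK gives two warnings about the same municipality: my data spells Tårnby as taernby, while the package expects taarnby, so the rows cannot be matched and Tårnby is left out of the map. I also have no data for Christiansø. Copenhagen has the value 0, since you cannot move from and to the same municipality.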
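
The warning above names both spellings, taernby (not recognized) and taarnby (no data provided), which suggests that a single recode would let the row match. The following is a minimal sketch under that assumption. The table `kommuner` is a small stand-in for the kommune column of data07.

```r
library(dplyr)

# Stand-in for the lowercased kommune column of data07.
kommuner <- tibble(kommune = c("frederiksberg", "dragoer", "taernby"))

# Replace the spelling that mapDK rejected with the one it asked for.
kommuner <- kommuner %>%
  mutate(kommune = if_else(kommune == "taernby", "taarnby", kommune))

kommuner
```

Applying the same mutate step to data07 before calling mapDK should, under this assumption, remove both warnings.
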

I find that this isn’t very easy to read, so I will now try municipalityDK.

This way the map becomes a little easier to read, and it is also interactive: you can click on a municipality to see its value.

kommunekort2 <- municipalityDK("procent", "kommune", data = data07, legend = TRUE, pal = "GnBu") %>%
  setMapWidgetStyle(list(background= "white"))
## Loading required package: sp
## Missing values for Christiansø
## Missing values for Tårnby
kommunekort2

I want to make the colour contrast even clearer and the map easier to use and understand.

kommunekort3 <- municipalityDK("procent", "kommune", data = data07, legend = TRUE, pal = colfunc(10)) %>%
  setMapWidgetStyle(list(background= "white"))
## Missing values for Christiansø
## Missing values for Tårnby
kommunekort3

The colours now run from blue to red, which makes the map much easier to read and understand.
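
The definition of colfunc is not shown in this section. A blue-to-red palette generator of this kind is commonly built with base R's `colorRampPalette`, so the sketch below assumes that is how it was defined. The name `palette10` is only illustrative.

```r
# Assumption: colfunc interpolates between blue and red; the actual definition is not shown here.
colfunc <- colorRampPalette(c("blue", "red"))

# Ten evenly spaced colours, returned as hex strings from pure blue to pure red.
palette10 <- colfunc(10)
palette10
```

Passing a vector like this as pal gives ten colour steps between the two end colours.
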

I see that Frederiksberg is the municipality with the highest value. Now I want to examine whether that has changed over time.

I use the Danmarkstatistik API again and load the data:

fre18 <- read_delim("https://api.statbank.dk/v1/data/FLY66/CSV?delimiter=Semicolon&TILKOMMUNE=101&FRAKOMMUNE=147&ALDER=18&Tid=*", delim = ";", show_col_types = FALSE)
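
The request URL packs several query parameters into one string. The sketch below rebuilds the same URL from a named vector, which makes each part easier to read and change. The variable names are illustrative. Judging from the context, TILKOMMUNE=101 is presumably Copenhagen and FRAKOMMUNE=147 Frederiksberg, and Tid=* asks for all available years.

```r
base_url <- "https://api.statbank.dk/v1/data/FLY66/CSV"

# Query parameters exactly as they appear in the URL above.
params <- c(
  delimiter  = "Semicolon",
  TILKOMMUNE = "101",
  FRAKOMMUNE = "147",
  ALDER      = "18",
  Tid        = "*"
)

# Join each name=value pair with "&" and append the result to the base URL.
query       <- paste(names(params), params, sep = "=", collapse = "&")
request_url <- paste0(base_url, "?", query)
request_url
```

Changing FRAKOMMUNE in the vector would then be enough to fetch the series for another municipality.
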

I then want to make a simple plot with ggplot to see whether this is a normal value or an outlier.

# Set the colour outside aes() so the line is drawn in red instead of being mapped to a legend entry.
ggplot(fre18, aes(x = TID, y = INDHOLD)) + geom_line(colour = "red")

Here I can see that 2007 isn’t an outlier, so it seems very likely that Copenhagen really is the most popular place for 18-year-olds from Frederiksberg to move to.
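
Reading outliers off a plot is a judgment call. As a rough numerical complement, the sketch below flags values that lie more than 1.5 times the interquartile range outside the quartiles. The rule is a common convention and not part of the analysis above. The helper `flag_outliers` is hypothetical, and the yearly counts in `toy` are made up for illustration only, not taken from the FLY66 data.

```r
# Hypothetical yearly counts, made up for illustration only (not the FLY66 figures).
toy <- data.frame(
  TID     = 2006:2012,
  INDHOLD = c(60, 63, 58, 65, 61, 120, 62)
)

# Flag values more than 1.5 * IQR below the first or above the third quartile.
flag_outliers <- function(x) {
  q     <- quantile(x, c(0.25, 0.75))
  fence <- 1.5 * (q[[2]] - q[[1]])
  x < q[[1]] - fence | x > q[[2]] + fence
}

toy$outlier <- flag_outliers(toy$INDHOLD)
toy[toy$outlier, ]
```

The same function could be applied to fre18$INDHOLD to check each year of the real series.
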